Minimum mean square error

In statistics and signal processing, a minimum mean square error (MMSE) estimator is an estimator that minimizes the mean square error (MSE), a common measure of estimator quality.

The term MMSE specifically refers to estimation in a Bayesian setting, since in the alternative frequentist setting there does not exist a single estimator having minimal MSE for every value of the unknown parameter. A somewhat similar concept can be obtained within the frequentist point of view if one requires unbiasedness: an estimator may exist that minimizes the variance (and hence the MSE) among all unbiased estimators. Such an estimator is then called the minimum-variance unbiased estimator (MVUE).

Definition

Let X be an unknown random variable, and let Y be a known random variable (the measurement). An estimator \hat{X}(y) is any function of the measurement Y, and its MSE is given by

\mathrm{MSE} = E \left\{ (\hat{X} - X)^2 \right\}

where the expectation is taken over both X and Y.

The MMSE estimator is then defined as the estimator achieving minimal MSE.

In many cases, it is not possible to determine a closed form for the MMSE estimator. In these cases, one possibility is to restrict attention to a particular class of estimators, such as the class of linear estimators, and seek the estimator minimizing the MSE within that class. The linear MMSE estimator is the estimator achieving minimum MSE among all estimators of the form AY + b. If the measurement Y is a random vector, A is a matrix and b is a vector. (Such an estimator would more correctly be termed an affine MMSE estimator, but the term linear estimator is widely used.)
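
As a rough illustration of a linear (affine) MMSE estimator, the following sketch (Python with NumPy, not part of the original text) fits the coefficients a and b of the estimator aY + b from sample moments of a simulated scalar model; the model Y = 2X + noise and all variable names are assumptions made only for this sketch.

    import numpy as np

    rng = np.random.default_rng(0)

    # Assumed model for illustration only: a scalar X observed through
    # a noisy measurement Y = 2*X + noise.
    n = 100_000
    X = rng.normal(loc=1.0, scale=1.0, size=n)
    Y = 2.0 * X + rng.normal(scale=0.5, size=n)

    # Affine (linear) MMSE estimator X_hat = a*Y + b, with
    #   a = Cov(X, Y) / Var(Y),   b = E[X] - a*E[Y],
    # estimated here from sample moments.
    cxy = np.cov(X, Y)
    a = cxy[0, 1] / cxy[1, 1]
    b = X.mean() - a * Y.mean()
    X_hat = a * Y + b

    print("a =", a, "b =", b)
    print("MSE of affine estimator:", np.mean((X_hat - X) ** 2))
    print("MSE of using the prior mean alone:", np.mean((X.mean() - X) ** 2))

For this simulated model the affine estimator's MSE is close to the theoretical value Var(X) - Cov(X,Y)^2/Var(Y), and is much smaller than the MSE obtained by ignoring the measurement.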

Properties

Under some regularity conditions,[1] the MMSE estimator is uniquely defined and is given by

\hat{X}_{\mathrm{MMSE}}(y) = E \left\{ X \mid Y=y \right\}.

In other words, the MMSE estimator is the conditional expectation of X given the observed value of the measurements.

The MMSE estimator also satisfies the orthogonality principle:

E \left\{ (\hat{X}-X) f(Y) \right\} = 0

for all functions f(Y) of the measurements. A different version of the orthogonality principle exists for linear MMSE estimators.
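
Both properties can be checked numerically. The following sketch (Python with NumPy, not from the original text) assumes a jointly Gaussian pair (X, Y), for which the conditional mean has the closed form E{X | Y} = E[X] + (Cov(X,Y)/Var(Y))(Y - E[Y]); it then verifies that the estimation error is uncorrelated with several functions of Y.

    import numpy as np

    rng = np.random.default_rng(1)

    # Assumed jointly Gaussian pair (X, Y), so the conditional mean is known:
    #   E[X | Y] = mu_x + (cov_xy / var_y) * (Y - mu_y)
    mu = np.array([0.0, 0.0])
    cov = np.array([[1.0, 0.8],
                    [0.8, 2.0]])
    samples = rng.multivariate_normal(mu, cov, size=200_000)
    X, Y = samples[:, 0], samples[:, 1]

    # MMSE estimator E[X | Y] for this Gaussian model
    X_hat = mu[0] + (cov[0, 1] / cov[1, 1]) * (Y - mu[1])
    err = X_hat - X

    # Orthogonality principle: the error is uncorrelated with functions of Y.
    for f in (lambda y: y, lambda y: y**2, np.sin):
        print("E[(X_hat - X) f(Y)] ~", np.mean(err * f(Y)))

    # Its MSE is smaller than that of a deliberately mistuned linear estimator.
    print("MSE of E[X|Y]:", np.mean(err**2))
    print("MSE of 0.9*Y :", np.mean((0.9 * Y - X)**2))

The sample averages E[(X_hat - X) f(Y)] come out close to zero for each choice of f, and the MSE of the conditional mean (about 0.68 for the assumed covariance) is smaller than that of the mistuned estimator.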

Example

As an example, we can use a linear combination of the random variables X_{1}, X_{2} and X_{3} to estimate another random variable X_{4}; the estimate is denoted \hat X_{4}. If the random variables X=[X_{1}, X_{2},X_{3},X_{4}]^{T} are real Gaussian random variables with zero mean and covariance matrix given by


\operatorname{cov}(X)=E[XX^{T}]=\left[\begin{array}{cccc}
1 & 2 & 3 & 4\\
2 & 5 & 8 & 9\\
3 & 8 & 6 & 10\\
4 & 9 & 10 & 15\end{array}\right],

we will estimate X_{4} and find coefficients a_{i} such that the estimate \hat X_{4}=\sum_{i=1}^{3}a_{i}X_{i} is an optimal linear estimate of X_{4}. We will use the autocorrelation matrix, R, and the cross-correlation vector, C, to find the vector A of coefficient values that minimizes the mean square error of the estimate. The autocorrelation matrix R is defined as

R=\left[\begin{array}{ccc}
E[X_{1}X_{1}] & E[X_{2}X_{1}] & E[X_{3}X_{1}]\\
E[X_{1}X_{2}] & E[X_{2}X_{2}] & E[X_{3}X_{2}]\\
E[X_{1}X_{3}] & E[X_{2}X_{3}] & E[X_{3}X_{3}]\end{array}\right]=\left[\begin{array}{ccc}
1 & 2 & 3\\
2 & 5 & 8\\
3 & 8 & 6\end{array}\right].

The cross-correlation vector C is defined as

C=\left[\begin{array}{c}
E[X_{4}X_{1}]\\
E[X_{4}X_{2}]\\
E[X_{4}X_{3}]\end{array}\right]=\left[\begin{array}{c}
4\\
9\\
10\end{array}\right].

To find the optimal coefficients by the orthogonality principle, we solve the equation RA=C by inverting R and multiplying to obtain

R^{-1}C=\left[\begin{array}{ccc}
4.857 & -1.714 & -0.1429\\
-1.714 & 0.4286 & 0.2857\\
-0.1429 & 0.2857 & -0.1429\end{array}\right]\left[\begin{array}{c}
4\\
9\\
10\end{array}\right]=\left[\begin{array}{c}
2.571\\
-0.1429\\
0.5714\end{array}\right]=A.

So we have a_{1}=2.571, a_{2}=-0.1429, and a_{3}=0.5714 as the optimal coefficients for \hat X_{4}. Computing the minimum mean square error then gives \left\Vert e\right\Vert _{\min}^{2}=E[X_{4}X_{4}]-C^{T}A=15-C^{T}A=0.2857.[2]
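
The numbers above can be reproduced with a few lines of linear algebra. The following sketch (Python with NumPy, not part of the original source) re-enters the covariance matrix from the example, extracts R and C as sub-blocks, and solves RA=C.

    import numpy as np

    # Covariance matrix of X = [X1, X2, X3, X4]^T from the example
    # (zero mean, so covariances and correlations coincide).
    cov_X = np.array([[1.0,  2.0,  3.0,  4.0],
                      [2.0,  5.0,  8.0,  9.0],
                      [3.0,  8.0,  6.0, 10.0],
                      [4.0,  9.0, 10.0, 15.0]])

    R = cov_X[:3, :3]   # autocorrelation matrix of (X1, X2, X3)
    C = cov_X[:3, 3]    # cross-correlations E[X4 Xi]

    # Orthogonality principle: solve R A = C for the coefficients.
    A = np.linalg.solve(R, C)
    print("A =", A)                  # approx [ 2.571, -0.143,  0.571]

    # Minimum mean square error: E[X4 X4] - C^T A
    mmse = cov_X[3, 3] - C @ A
    print("MMSE =", mmse)            # approx 0.2857

Using numpy.linalg.solve rather than explicitly inverting R gives the same coefficients while avoiding the rounding introduced by the truncated inverse shown above.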

A shorter, non-numerical example can be found in the article on the orthogonality principle.

Notes

  1. Lehmann and Casella, Corollary 4.1.2.
  2. Moon and Stirling.

Further reading